Holistic Query Evaluation over Information Extraction Pipelines

Authors

  • Ekaterini Ioannou
  • Minos N. Garofalakis
Abstract

We introduce holistic in-database query processing over information extraction pipelines. This requires considering the joint conditional distribution over generic Conditional Random Fields that uses factor graphs to encode extraction tasks. Our approach introduces Canopy Factor Graphs, a novel probabilistic model for effectively capturing the joint conditional distribution given a canopy clustering of the data, and special query operators for retrieving resolution information. Since inference on such models is intractable, we introduce an approximate technique for query processing and optimizations that cut across the integrated tasks for reducing the required processing time. Effectiveness and scalability are verified through an extensive experimental evaluation using real and synthetic data.

PVLDB Reference Format: Ekaterini Ioannou and Minos Garofalakis. Holistic Query Evaluation over Information Extraction Pipelines. PVLDB, 11(2): 217-229, 2017. DOI: 10.14778/3149193.3149201

Similar resources

CUNY-BLENDER TAC-KBP2010 Entity Linking and Slot Filling System Description

The CUNY-BLENDER team participated in the following tasks in TAC-KBP2010: Regular Entity Linking, Regular Slot Filling and Surprise Slot Filling task (per:disease slot). In the TAC-KBP program, the entity linking task is considered as independent from or a pre-processing step of the slot filling task. Previous efforts on this task mainly focus on utilizing the entity surface information and the...

Towards Holistic Web-Based Information Retrieval: An Agent-Based Approach

This paper presents an agent-based system for bolstering holistic information retrieval via the WWW. In Ellis’ holistic model of information seeking behaviors, the information seeking activities include: selection of sources, browsing and differentiating, monitoring as well as extraction. Through the use of a query processing agent (QPA), information filtering agents (IFAs) and information moni...

Experiences using F# for developing analysis scripts and tools over search engine query log data

We describe our experience using the programming language F# for analysis of text query logs from the Bing search engine. The goals of the project were to develop a set of scripts for enabling ad-hoc query analysis, clustering and feature extraction as well as to provide a subset of these within a data exploration tool developed for non-programmers. Where appropriate we describe programming pat...

Top-Down and Bottom-Up: A Combined Approach to Slot Filling

The Slot Filling task requires a system to automatically distill information from a large document collection and return answers for a query entity with specified attributes (‘slots’), and use them to expand the Wikipedia infoboxes. We describe two bottom-up Information Extraction style pipelines and a top-down Question Answering style pipeline to address this task. We propose several novel app...

An Effective Path-aware Approach for Keyword Search over Data Graphs

Keyword Search is known as a user-friendly alternative for structured languages to retrieve information from graph-structured data. Efficient retrieving of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in the keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...

Journal title:
  • PVLDB

Volume 11, Issue 2

Pages 217-229

Publication date 2017